
 gradient analysis


Randomized Matrix Sketching for Neural Network Training and Gradient Monitoring

Antil, Harbir, Verma, Deepanshu

arXiv.org Artificial Intelligence

Neural network training relies on gradient computation through backpropagation, yet memory requirements for storing layer activations present significant scalability challenges. We present the first adaptation of control-theoretic matrix sketching to neural network layer activations, enabling memory-efficient gradient reconstruction in backpropagation. This work builds on recent matrix sketching frameworks for dynamic optimization problems, where similar state-trajectory storage challenges motivate sketching techniques. Our approach sketches layer activations using three complementary sketch matrices maintained through exponential moving averages (EMA) with adaptive rank adjustment, automatically balancing memory efficiency against approximation quality. Empirical evaluation on MNIST, CIFAR-10, and physics-informed neural networks demonstrates a controllable accuracy-memory tradeoff. We further present a gradient monitoring application on MNIST, showing how sketched activations enable real-time gradient norm tracking with minimal memory overhead. These results establish that sketched activation storage provides a viable path toward memory-efficient neural network training and analysis.
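The core idea can be illustrated with a minimal randomized range sketch (in the style of Halko et al.); this is a simplified single-sketch stand-in, not the paper's three-sketch EMA scheme, and all names and sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer activations": n samples x d features, approximately low-rank,
# as activations in trained networks often are.
n, d, true_rank, k = 500, 256, 10, 20
A = rng.standard_normal((n, true_rank)) @ rng.standard_normal((true_rank, d))

# Randomized range sketch: compress the d-dimensional activations to k columns.
Omega = rng.standard_normal((d, k))   # random test matrix
Y = A @ Omega                         # sketch: n x k instead of n x d
Q, _ = np.linalg.qr(Y)                # orthonormal basis for (approx.) range(A)

# Rank-k reconstruction. For illustration we keep A around to measure error;
# a full sketching framework would store only sketch factors, not A itself.
A_hat = Q @ (Q.T @ A)

rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.2e}")
```

Because the sketch rank k exceeds the true rank of the toy activations, the reconstruction is essentially exact here; in practice the adaptive rank adjustment described above would trade this error against memory.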


A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models

Tuan, Yi-Lin, Wang, William Yang

arXiv.org Artificial Intelligence

Beyond maximum likelihood estimation (MLE), the standard objective of a language model (LM) that maximizes the probabilities of good examples, many studies have explored ways to also penalize bad examples, enhancing the quality of the output distribution; these include unlikelihood training, exponential maximizing average treatment effect (ExMATE), and direct preference optimization (DPO). To systematically compare these methods and further provide a unified recipe for LM optimization, in this paper we present a unique angle of gradient analysis of loss functions that simultaneously reward good examples and penalize bad ones in LMs. Through both mathematical results and experiments on the CausalDialogue and Anthropic HH-RLHF datasets, we identify distinct functional characteristics among these methods. We find that ExMATE serves as a superior surrogate for MLE, and that combining DPO with ExMATE instead of MLE further enhances both the statistical (5-7%) and generative (+18% win rate) performance.
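The kind of gradient analysis described can be sketched on a toy softmax: below we compare the logit gradients of an MLE loss on a good token against an unlikelihood-style loss on a bad token. This is a generic illustration of the reward-good/penalize-bad gradient structure, not the paper's specific derivations; the vocabulary and logit values are made up.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy vocabulary of 4 tokens; index 0 is a "good" example, index 1 a "bad" one.
z = np.array([0.5, 1.2, -0.3, 0.1])
p = softmax(z)
good, bad = 0, 1

# MLE (reward good): loss = -log p[good]; gradient w.r.t. logits is p - onehot.
grad_mle = p.copy()
grad_mle[good] -= 1.0

# Unlikelihood (penalize bad): loss = -log(1 - p[bad]).
# Using d p[bad]/dz_j = p[bad] * (1{j==bad} - p_j):
onehot_bad = np.eye(len(z))[bad]
grad_unl = p[bad] * (onehot_bad - p) / (1.0 - p[bad])

print("MLE gradient:         ", np.round(grad_mle, 4))
print("unlikelihood gradient:", np.round(grad_unl, 4))
```

Comparing such gradients as functions of the model's current probabilities is what distinguishes the methods: for instance, the unlikelihood gradient on the bad token scales with p[bad], so it fades as the bad example is already suppressed.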


Outlier Gradient Analysis: Efficiently Improving Deep Learning Model Performance via Hessian-Free Influence Functions

Chhabra, Anshuman, Li, Bo, Chen, Jian, Mohapatra, Prasant, Liu, Hongfu

arXiv.org Artificial Intelligence

Data-centric learning focuses on enhancing algorithmic performance from the perspective of the training data [Oala et al., 2023]. In contrast to model-centric learning, which designs novel algorithms or optimization techniques for performance improvement with fixed training data, data-centric learning operates with a fixed learning algorithm while modifying the training data through trimming, augmenting, or other methods aligned with improving utility [Zha et al., 2023]. Data-centric learning holds significant potential in many areas such as model interpretation, training subset selection, data generation, noisy label detection, active learning, and others [Chhabra et al., 2024, Kwon et al., 2024]. The essence of data-centric learning lies in estimating data influence, also known as data valuation [Hammoudeh and Lowd, 2022], in the context of a learning task. Intuitively, the impact of an individual data sample can be measured by assessing the change in learning utility when training with and without that specific sample. This leave-one-out influence [Cook and Weisberg, 1982] provides a rough gauge of a sample's influence relative to the otherwise fixed full training set. The Shapley value [Ghorbani and Zou, 2019, Jia et al., 2019], originating from cooperative game theory, instead quantifies the increase in value when a group of samples collaborates to achieve the learning goal: unlike leave-one-out influence, it represents the weighted average utility change resulting from adding the point to different training subsets. Despite making no assumptions about the learning model, these retraining-based methods incur significant computational costs, especially for large-scale data analysis and deep models [Hammoudeh and Lowd, 2022].


Comparing interpretation methods in mental state decoding analyses with deep learning models

Thomas, Armin W., Ré, Christopher, Poldrack, Russell A.

arXiv.org Artificial Intelligence

Deep learning (DL) models find increasing application in mental state decoding, where researchers seek to understand the mapping between mental states (e.g., perceiving fear or joy) and brain activity by identifying those brain regions (and networks) whose activity allows these states to be accurately identified (i.e., decoded). Once a DL model has been trained to accurately decode a set of mental states, neuroimaging researchers often make use of interpretation methods from explainable artificial intelligence research to understand the model's learned mappings between mental states and brain activity. Here, we compare the explanation performance of prominent interpretation methods in a mental state decoding analysis of three functional Magnetic Resonance Imaging (fMRI) datasets. Our findings demonstrate a gradient between two key characteristics of an explanation in mental state decoding, namely, its biological plausibility and its faithfulness: interpretation methods with high explanation faithfulness, which capture the model's decision process well, generally provide explanations that are biologically less plausible than those of interpretation methods with lower explanation faithfulness. Based on this finding, we provide specific recommendations for the application of interpretation methods in mental state decoding.
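One of the simplest interpretation methods of the kind being compared is the plain input gradient (and its gradient-times-input refinement). The sketch below applies it to a toy logistic "decoder"; the features, weights, and setup are invented stand-ins for voxel or region signals, not anything from the paper's fMRI analyses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "decoder": logistic regression over 5 input features (stand-ins for
# brain-region signals); only the first two actually drive the decision.
w = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
b = 0.1
x = np.array([0.8, 0.3, 0.9, -0.4, 0.2])

p = sigmoid(w @ x + b)                 # decoded probability of the mental state

# Plain input-gradient explanation: d p / d x = p * (1 - p) * w.
grad = p * (1 - p) * w

# Gradient x input attribution, a common refinement.
attribution = grad * x

print("gradient:   ", np.round(grad, 4))
print("grad*input: ", np.round(attribution, 4))
```

For a faithful explanation of this linear decoder, attributions on the zero-weight features are exactly zero; the paper's point is that such faithfulness to the model need not coincide with biological plausibility of the highlighted regions.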